Tools for Generation of Natural Inflected Language Processors
Abstract
Support for multiple languages and for natural language processing is highly important in information systems. This paper discusses software tools for generating language processors (LPs) for natural inflected languages. The tools are implemented in the LP generator DUAL, which supports formal specification and reuse of the developed components. The declarative language Dual is used to specify words, idioms, and their processing. The paper describes the automatic generation of dictionaries from their specifications in the Dual language and the reuse of software components, which enables fast construction of user-oriented software systems for processing natural inflected languages. The generated LPs are intended for word-for-word translation of domain-specific texts in inflected languages, and for generating frequency lists of words and phrases used in the statistical analysis of texts in inflected and analytical languages written in the Cyrillic or Latin alphabet.
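The abstract does not show Dual syntax, so the following is only a minimal Python sketch of the three tasks it names, under stated assumptions. A hypothetical paradigm table (a stem plus one ending per grammatical tag) stands in for a Dual specification and is expanded into a full-form dictionary. That dictionary then drives word-for-word glossing, and a separate function builds frequency lists of words and phrases by counting contiguous n-grams. The names `PARADIGMS`, `LEXICON`, `generate_dictionary`, `translate_word_for_word` and `frequency_list` are illustrative, and the Serbian (Latin-script) entries are sample data, not taken from the paper.

```python
import re
from collections import Counter, defaultdict

# Hypothetical inflection classes: grammatical tag -> ending appended to the stem.
# Illustrative only; this is NOT the Dual specification language.
PARADIGMS = {
    "noun_a": {"nom.sg": "a", "gen.sg": "e", "dat.sg": "i",
               "acc.sg": "u", "nom.pl": "e", "gen.pl": "a"},
    "verb_ati_present": {"1sg": "m", "2sg": "š", "3sg": "",
                         "1pl": "mo", "2pl": "te", "3pl": "ju"},
}

# Hypothetical lexicon entries: (lemma, stem, paradigm name, English gloss).
LEXICON = [
    ("žena", "žen", "noun_a", "woman"),
    ("kuća", "kuć", "noun_a", "house"),
    ("čitati", "čita", "verb_ati_present", "read"),
]


def generate_dictionary(lexicon, paradigms):
    """Expand every entry into all of its inflected forms.

    Returns a mapping form -> set of (lemma, tag, gloss). A form may have
    several analyses, e.g. 'žene' is both gen.sg and nom.pl.
    """
    dictionary = defaultdict(set)
    for lemma, stem, paradigm, gloss in lexicon:
        for tag, ending in paradigms[paradigm].items():
            dictionary[stem + ending].add((lemma, tag, gloss))
    return dictionary


def tokenize(text):
    # In Python 3, \w matches Unicode letters, so both Latin (ž, č, ć)
    # and Cyrillic text are split into words.
    return re.findall(r"\w+", text.lower())


def translate_word_for_word(text, dictionary):
    """Replace each known form by its gloss(es); prefix unknown words with '?'."""
    out = []
    for token in tokenize(text):
        # .get does not insert missing keys into the defaultdict.
        glosses = sorted({gloss for _, _, gloss in dictionary.get(token, ())})
        out.append("/".join(glosses) if glosses else f"?{token}")
    return " ".join(out)


def frequency_list(text, max_n=2):
    """Count single words (n=1) and contiguous phrases of up to max_n words."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    d = generate_dictionary(LEXICON, PARADIGMS)
    print(sorted(d["žene"]))  # [('žena', 'gen.sg', 'woman'), ('žena', 'nom.pl', 'woman')]
    print(translate_word_for_word("Žena čita knjigu", d))  # woman read ?knjigu
    print(frequency_list("žena čita, žena čita")["žena čita"])  # 2
```

Generating the full-form dictionary ahead of time, rather than analysing suffixes at lookup time, keeps translation a plain table lookup. The cost is that ambiguous forms must keep every analysis, which is why each form maps to a set.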
Similar resources
Highly-Inflected Language Generation Using Factored Language Models
Statistical language models based on n-gram counts have been shown to successfully replace grammar rules in standard 2-stage (or ‘generate-and-select’) Natural Language Generation (NLG). In highly inflected languages, however, the amount of training data required to cope with n-gram sparseness may be simply unobtainable, and the benefits of a statistical approach become less obvious. In this wor...
Making Requirements Specifications Accessible via Logic, Language and Graphics: a Progress Report
Natural language software tools may have an important role in making requirements specifications more accessible. Possible tools include text processors to support requirements elicitation, and text generators to support requirements validation. The current paper reports on our progress in developing a natural language generation system, integrating this tool with a graphical interface and an a...
A Computational Lexicon of Contemporary Hebrew
Computational lexicons are among the most important resources for natural language processing (NLP). Their importance is even greater in languages with rich morphology, where the lexicon is expected to provide morphological analyzers with enough information to enable them to correctly process intricately inflected forms. We describe the Haifa Lexicon of Contemporary Hebrew, the broadest-coverag...
Remote Elicitation of Inflectional Paradigms to Seed Morphological Analysis in Low-Resource Languages
Structured, complete inflectional paradigm data exists for very few of the world’s languages, but is crucial to training morphological analysis tools. We present methods inspired by linguistic fieldwork for gathering inflectional paradigm data in a machine-readable, interoperable format from remotely-located speakers of any language. Informants are tasked with completing language-specific parad...
Improving the Naturalness and Expressivity of Language Generation for Spanish
We present a flexible Natural Language Generation approach for Spanish, focused on the surface realisation stage, which integrates an inflection module in order to improve the naturalness and expressivity of the generated language. This inflection module inflects the verbs using an ensemble of trainable algorithms whereas the other types of words (e.g. nouns, determiners, etc.) are inflected usi...